NOMBAS - A Bayesian Procedure for Selecting the Greatest Mean

by 

Alan R. Washburn 



Introduction: Suppose that an experimenter must choose one category out of
k after making a limited number of performance tests. The experimenter's
goal is to select the category with the greatest mean performance. The cate- 
gories could represent anything from competing aircraft designs to feed sup- 
plements; whatever the interpretation, the statistical problem is usually 
referred to as being one of "greatest mean selection". Several testing pro- 
cedures are available in the literature [2,6,8]. The purpose of this paper 
is to propose a new one (NOMBAS) and compare it with certain others. 

If the experimenter were to test each category a fixed number of times, 
he would typically discover at the end of testing that some of the categories 
have experimental means that are so small that he would regret having tested 
them so much. This suggests that substantial gains might be possible by using 
a sequential procedure wherein the category to be tested next and perhaps even 
the decision to stop testing depend on results achieved so far. This is what 
we have in mind. More precisely, NOMBAS is a procedure where at every stage 
the mean performance for each category is regarded as a normal random variable. 
Initial values for the mean and variance of the mean performance for each 
category must be provided by the experimenter. Whenever testing stops, the 
experimenter simply selects the category with the largest current mean. If 
testing is to be continued, the experimenter tests the category for which the 
expected gain from one more test is maximal; this procedure is "myopic" 
because there will typically be several tests yet to be made. If each test 
involves a normally distributed experimental error, then it is elementary to
apply Bayes' Theorem to obtain "revised" values for the mean and variance of
the tested category, after which the procedure is repeated until finally the
decision to stop testing is made. All this will be formalized below; our hope
at this point is merely to have explained the source of the acronym: NOrmal
Myopic BAyes Sequential procedure.

In making Bayesian calculations based on normal distributions, we are 
following [14]. The pervasive assumption of normality is perhaps not as
restrictive as it might seem at first sight. Recall that the experimenter's
purpose is to select the category with the greatest mean. If testing consists
of making a sequence of independent observations, then it is inevitable that
the choice of which category to select will be based on the experimental means
of the observations for each category. By the Central Limit Theorem, the
experimental means themselves, being sums of independent random variables, tend
to be normal even if the individual observations are not. So there is reason 
to hope that the NOMBAS procedure may be robust with respect to deviations from 
normality. This is one of the issues that will be explored numerically below, 
but first we will describe NOMBAS in more detail. 

The NOMBAS procedure 

Let $\theta_i$ be the mean performance of category i. For all i, we assume
that $\theta_i$ is normal with mean $\theta_{i0}$ and variance $\sigma_{i0}^2$. Let $\theta_{ij}$ and $\sigma_{ij}^2$
be the mean and variance of $\theta_i$ given the results of the first j tests;
$j \ge 1$. If the jth test is made on category i, we assume that the observed
result of that test is $Z_j = \theta_i + W_j$, where $W_j$ is normal with mean 0
(this is no loss of generality) and known variance $s_{ij}^2$, and independent of
$\theta_1, \ldots, \theta_k, W_1, \ldots, W_{j-1}$. By using either Bayes' Theorem or the update equations
of a Kalman Filter [5], one can show that for the category tested and $j \ge 1$,

(1)    $\theta_{ij} = \theta_{i,j-1} + (1 - \rho_{ij})(Z_j - \theta_{i,j-1})$, and

(2)    $\sigma_{ij}^2 = \rho_{ij} \sigma_{i,j-1}^2$, where

(3)    $\rho_{ij} = s_{ij}^2 / (s_{ij}^2 + \sigma_{i,j-1}^2)$.

For any category i not tested on test j, $\theta_{ij} = \theta_{i,j-1}$ and $\sigma_{ij}^2 = \sigma_{i,j-1}^2$.
Furthermore, conditional on the results of the first j tests, all of the
$\theta_i$ are normal and independent of each other.
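The update in equations (1) - (3) can be sketched in a few lines of Python. This is only an illustration of the stated model, not code from the original study; the function name `bayes_update` and the numerical inputs in the usage lines are hypothetical.

```python
def bayes_update(theta_prev, var_prev, z, s2):
    """Apply equations (1)-(3) to the category that was just tested.

    theta_prev, var_prev: mean and variance before the test.
    z: observed test result; s2: known error variance of that test.
    """
    rho = s2 / (s2 + var_prev)                            # equation (3)
    theta = theta_prev + (1.0 - rho) * (z - theta_prev)   # equation (1)
    var = rho * var_prev                                  # equation (2)
    return theta, var

# Hypothetical numbers: prior N(0, 1), error standard deviation 0.5, observation 1.2.
theta, var = bayes_update(theta_prev=0.0, var_prev=1.0, z=1.2, s2=0.25)
print(theta, var)
```

Categories that were not tested keep their previous mean and variance, so only one pair of numbers changes per test.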

If the jth test is the last one, then NOMBAS selects category *, where
$\theta_{*j} = \max_i \theta_{ij}$. If exactly one more test of category $i \ne *$ were made, the
gain from that test would be $G_{ij} = \max(0, \theta_{i,j+1} - \theta_{*j})$, since the larger of
$\theta_{i,j+1}$ and $\theta_{*j}$ would be selected after the test. Given the results of the
first j tests, $\theta_{i,j+1} - \theta_{*j}$ is normal with mean $\theta_{ij} - \theta_{*j}$ and variance
$(1 - \rho_{i,j+1})^2 (\sigma_{ij}^2 + s_{i,j+1}^2) = \sigma_{ij}^4 / (\sigma_{ij}^2 + s_{i,j+1}^2)$, so the expected value of
$G_{ij}$ is

(4)    $g_{ij} \equiv E(G_{ij}) = \tilde{\sigma}_{ij} F(\delta_{ij} / \tilde{\sigma}_{ij})$, where

(5)    $\tilde{\sigma}_{ij} = \sigma_{ij}^2 / \sqrt{\sigma_{ij}^2 + s_{i,j+1}^2}$, and

(6)    $\delta_{ij} = \theta_{*j} - \theta_{ij}$, and

(7)    $F(y) = \int_y^\infty (x - y) \, d\Phi(x)$ (see the section "The function F(y)").

Equations (4) - (7) also hold for i = *, provided $\delta_{*j}$ is taken to be the
(non-negative) difference between the largest and second largest of the
$\theta_{ij}$; i = 1,...,k.
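As a hedged illustration of equations (4) - (7), the following Python sketch computes the expected one-test gain for every category. It evaluates F(y) through the closed form that the paper gives later as equation (11); the function names are hypothetical. The sample inputs are the prior means, prior variances, and error variances of the counterexample in the section "NOMBAS is not optimal", for which the paper reports gains of .123 and .162 for categories 1 and 2.

```python
import math

def F(y):
    """Closed form of (7), as given in (11): phi(y) - y * (1 - Phi(y))."""
    phi = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(y / math.sqrt(2.0))  # 1 - Phi(y)
    return phi - y * upper_tail

def expected_gains(theta, var, s2):
    """Expected one-test gain of equation (4) for every category.

    theta, var: current means and variances, one entry per category.
    s2: error variance of the next test on each category.
    """
    ranked = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    gains = []
    for i in range(len(theta)):
        if var[i] == 0.0:
            gains.append(0.0)  # a test cannot move a mean that is known exactly
            continue
        scale = var[i] / math.sqrt(var[i] + s2[i])   # equation (5)
        rival = runner_up if i == best else best     # rule for i = *
        delta = abs(theta[rival] - theta[i])         # equation (6)
        gains.append(scale * F(delta / scale))       # equation (4)
    return gains

# Inputs taken from the paper's counterexample (k = 3, unit error variance).
gains = expected_gains([0.0, 1.0, 1.0], [2.0, 0.5, 0.0], [1.0, 1.0, 1.0])
print(gains)
```

The zero-variance branch is a design choice of this sketch: it avoids dividing by zero and returns the limiting value of the gain.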
i.3 
We now distinguish two versions of the NOMBAS procedure: NOMBASN makes
exactly n tests, with test j being on the category for which $g_{i,j-1}$ is
largest. NOMBASG stops testing unless $\max_i g_{i,j-1} \ge g > 0$, in which case the jth
test is on the category that attains this maximum. Each procedure has a
parameter associated with it that determines when to stop: n in the case of
NOMBASN and g in the case of NOMBASG.
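The two stopping rules can be sketched as one loop. This is a minimal illustration rather than the program used for the reported results: it assumes a constant error variance `s2`, a hypothetical `observe(i)` callback that returns one test result for category i, and a safety cap `max_tests` that is not part of the procedure as described. Passing `n` gives NOMBASN; passing `g_min` gives NOMBASG.

```python
import math
import random

def F(y):
    phi = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return phi - y * 0.5 * math.erfc(y / math.sqrt(2.0))

def gains(theta, var, s2):
    """Expected one-test gains, equations (4)-(7), for constant error variance s2."""
    ranked = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    out = []
    for i in range(len(theta)):
        if var[i] <= 0.0:
            out.append(0.0)
            continue
        scale = var[i] / math.sqrt(var[i] + s2)
        delta = abs(theta[i] - theta[runner_up if i == best else best])
        out.append(scale * F(delta / scale))
    return out

def nombas(theta0, var0, s2, observe, n=None, g_min=None, max_tests=10_000):
    """Return (selected category, number of tests made)."""
    theta, var = list(theta0), list(var0)
    tests = 0
    limit = n if n is not None else max_tests
    while tests < limit:
        g = gains(theta, var, s2)
        i = max(range(len(g)), key=lambda c: g[c])   # myopic choice
        if g_min is not None and g[i] < g_min:       # NOMBASG stopping rule
            break
        z = observe(i)
        rho = s2 / (s2 + var[i])                     # equation (3)
        theta[i] += (1.0 - rho) * (z - theta[i])     # equation (1)
        var[i] *= rho                                # equation (2)
        tests += 1
    selected = max(range(len(theta)), key=lambda c: theta[c])
    return selected, tests

# Hypothetical experiment: ten categories, standard normal means, error s.d. 0.5.
rng = random.Random(1)
true_means = [rng.gauss(0.0, 1.0) for _ in range(10)]
observe = lambda i: true_means[i] + rng.gauss(0.0, 0.5)
sel_n, tests_n = nombas([0.0] * 10, [1.0] * 10, 0.25, observe, n=20)
sel_g, tests_g = nombas([0.0] * 10, [1.0] * 10, 0.25, observe, g_min=1e-3)
print(sel_n, tests_n, sel_g, tests_g)
```

Ties in the gain are broken here in favor of the lowest index, which is an arbitrary choice of the sketch.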

Selection of Competing Procedures 

Testing procedures for the greatest mean selection problem can be roughly 
categorized according to whether the number of tests performed is fixed or 
random, and also according to whether the order of testing is fixed or random. 
Let us adopt the notation RF for procedures where the number of tests is 
random but the order is (or could be) fixed, etc. An example of an FF pro- 
cedure is the procedure of testing each category a fixed number of times and 
then selecting the category with the largest experimental mean [1]. Examples 
of RF procedures are those of Bechhofer, Kiefer, and Sobel [2], and also 
Blumenthal [3]. NOMBASN is the only FR procedure known to the author. The 
procedures of Paulson [11] and Stein [13] each involve the idea of eliminating 
certain categories as testing proceeds; like NOMBASG, they are RR procedures. 
Since the RR procedures were expected to dominate the other classes, all 
three of the RR procedures were compared. The other two (there were five in 
total) were NOMBASN and the FF procedure called FIXED. We describe FIXED, 
PAULSON, and STEIN in detail below. The five procedures will be compared by 
showing how the Bayes risk depends on average sample number for each.
Specifically, let I be the index selected, let $L = \max_i \theta_i - \theta_I$, and let N be
the number of tests. Then E(L) is the Bayes risk and E(N) is the average
sample number.
The FIXED procedure:

In this scheme, the k categories are tested cyclically in the order
1, 2, ..., k, 1, .... After a total of n tests, the category with the greatest
experimental mean is selected, counting the experimental mean as $\theta_{i0}$ for any
untested category. For n = km, where m is an integer representing the number
of times each category is tested, a simple expression for E(L) can be
determined for the case where $\theta_i$ is standard normal and $s_{ij} = s$ for all i, j
as follows: Harter [7] has tabulated $\mu_k \equiv$ (average of the largest of k
independent unit normals), so $\mu_k$ is the best average gain achievable with
perfect knowledge. Since m observations with variance $s^2$ are equivalent to
one observation with variance $s^2/m$, each category has variance
$\sigma_{i,km}^2 = (s^2/m)/(s^2/m + 1) = s^2/(s^2 + m) \equiv \sigma^2$ associated with it after km
observations, from (2) and (3). Since $\theta_i$ is standard normal and also normal
with mean $\theta_{i,km}$ and variance $\sigma^2$, $\theta_{i,km}$ must be normal with mean 0 and
variance $1 - \sigma^2$. The expected value of the largest of the $\theta_{i,km}$ is
therefore $\mu_k \sqrt{1 - \sigma^2}$, and hence

(8)    $E(L) = \mu_k (1 - \sqrt{1 - \sigma^2}) = \mu_k (1 - \sqrt{m/(m + s^2)})$.

For k = 10 and s = .5, this reduces to

(9)    $E(L) = 1.53875 (1 - \sqrt{m/(m + .25)})$.

Formula (9) is consistent with the FIXED curve in Figure 1, with m = 1
corresponding to E(N) = 10, etc. The FIXED curve was obtained by simulation,
like all the others.
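Formula (9) is easy to tabulate. The short Python sketch below does so for a few values of m; the function name `fixed_bayes_risk` is hypothetical, and the constant 1.53875 is the value quoted in (9) for k = 10.

```python
import math

MU_10 = 1.53875  # mean of the largest of ten unit normals, as quoted in (9)

def fixed_bayes_risk(m, s=0.5, mu_k=MU_10):
    """Bayes risk E(L) of FIXED after m tests per category (standard normal prior)."""
    return mu_k * (1.0 - math.sqrt(m / (m + s * s)))

for m in (1, 2, 3, 4):
    # E(N) = 10 m tests in total for k = 10 categories
    print(10 * m, round(fixed_bayes_risk(m), 4))
```

The risk falls toward zero as m grows, since the square root approaches one.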
The PAULSON procedure:

Paulson's [10] procedure irrevocably eliminates categories until only one
is left, testing all surviving categories at each stage. After r stages,
let $\bar{Z}_i$ be the average of the r measurements that have been made on each
category i that survived the first r-1 stages, and let $\bar{Z}_*$ be the largest
of these. If $\bar{Z}_i < \bar{Z}_* + \lambda - a_\lambda / r$, category i is eliminated at the rth
stage. The maximum number of stages is clearly $a_\lambda / \lambda$ rounded up to the next
integer, since by then all categories except the largest have been eliminated.
Paulson's procedure has two parameters, $\lambda$ and $a_\lambda$. He shows in [10] that
if $s_{ij} = s$ for all i, j, and if $a_\lambda = [s^2/(\Delta - \lambda)] \log((k - 1)/\alpha)$, then
his procedure will select the category with the largest mean with probability
at least $1 - \alpha$, provided the largest mean exceeds the next largest by at least
$\Delta > 0$, for any $\lambda$ in the interval $(0, \Delta)$.

We take Paulson's recommendation [11] and set $\lambda = (3/8)\Delta$. The procedure
PAULSON has $\alpha = .1$, which leaves one parameter ($\Delta$) free. E(L) increases with
$\Delta$ and E(N) decreases with $\Delta$; the curves labelled PAULSON in Figures 1-3 were
generated parametrically by varying $\Delta$. Since PAULSON tests each category at
least once, E(L) is not defined for E(N) < 10 in our examples. Limited testing
with $\alpha \ne .1$ did not reveal a significantly better value for $\alpha$ over the
range of E(N) considered.
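The elimination rule can be sketched as follows. This is an illustrative reading of the description above, not Paulson's or the author's code: the names `paulson`, `observe`, and `big_delta` are hypothetical, the constant error standard deviation `s` is assumed known, and the sketch explicitly never eliminates the current leader, which is how the statement that only the largest survives is interpreted here.

```python
import math

def paulson(observe, k, s, big_delta, alpha=0.1):
    """Return (selected category, total number of tests)."""
    lam = 0.375 * big_delta                                   # lambda = (3/8) Delta
    a_lam = (s * s / (big_delta - lam)) * math.log((k - 1) / alpha)
    alive = list(range(k))
    sums = [0.0] * k
    tests = 0
    r = 0
    while len(alive) > 1:
        r += 1
        for i in alive:                 # test every surviving category once
            sums[i] += observe(i)
            tests += 1
        means = {i: sums[i] / r for i in alive}
        leader = max(alive, key=lambda i: means[i])
        cutoff = means[leader] + lam - a_lam / r
        # Assumption of this sketch: the leader itself always survives.
        alive = [i for i in alive if i == leader or means[i] >= cutoff]
    return alive[0], tests

# Hypothetical noise-free check: three categories with means 0, 0, and 1.
means_true = [0.0, 0.0, 1.0]
chosen, n_tests = paulson(lambda i: means_true[i], k=3, s=0.5, big_delta=0.5)
print(chosen, n_tests)
```

Because the cutoff rises with r, the loop ends no later than shortly after r reaches $a_\lambda / \lambda$.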

The STEIN procedure: 

Reference [13] is reproduced in its entirety below.

"Suppose $X_{ij}$, i = 1,...,p; j = 1,2,... are independently normally
distributed with means $\xi_i + \eta_j$ and variances $\sigma_j^2$ where $\xi_i$, $\eta_j$ are unknown but
$\sigma_j^2$ are known. $\epsilon$, $\alpha$ are fixed numbers, with $0 < \epsilon$, $0 < \alpha < 1$. It is
desired to select, by a sequential procedure, in which we take first the
observations with second subscript 1, etc. an integer M among 1,...,p
such that for every k = 1,...,p and $\xi_1, \ldots, \xi_p$ satisfying
$\xi_k \ge \xi_j + \epsilon$ for all $j \ne k$, $P(M = k) \ge 1 - \alpha$. In accordance with the
following rule, one decides at each stage (after the observations with
second subscript n) to take no more observations with certain first
subscripts. For each n = 1,2,... and each $\ell$ = 1,...,p compute



l a 



j=i 






- x. 

J 





where $\bar{X}_j$ is the average of the observations with second subscript j
and $t_j$ is the number of such observations. Continue taking observations
for those $\ell$ for which this expression is greater than $(\ln \alpha)/\epsilon$
but not for the others. Eventually there will be at most one subscript
$\ell$ = 1,...,p for which one continues to take observations and if there is
one this is chosen to be M. If there is none, the $\ell$ for which the sum
is largest is chosen to be M. This procedure is a straight-forward
application of the Lemma on p. 146 of Wald's Sequential Analysis and
generalizations can easily be found."

In our case $X_{ij} = \theta_i + W_j$ and $\eta_j = 0$ for all i, j. Stein's procedure
has two parameters, $\alpha$ and $\epsilon$. Our procedure STEIN is Stein's with $\alpha = .1$;
this leaves $\epsilon$ free to parametrically generate E(L) vs. E(N). As in the case
of PAULSON, limited testing did not reveal a significantly better value for $\alpha$
over the range of E(N) considered.



Results 

Figure 1 shows E(L) vs. E(N) for the five competing procedures. In all
cases k = 10, $s_{ij} = .5$ for all i, j, and $\theta_{i0} = 0$ and $\sigma_{i0} = 1$ for all i.
The random variables $\theta_i$ and $W_j$ were generated as assumed by NOMBAS using
the LLRANDOM random number generator [9]. Note that NOMBASN dominates FIXED
and that NOMBASG dominates all other procedures in this example. Results are
based on 5000 replications in all cases; a 68% confidence interval is shown in
the shape of an I for a set of points that is incomplete but hopefully large
enough to indicate sampling variability without cluttering the figure. An
additional run was made for a procedure called NOMBASG2 in which all random
variables were generated as above but $\sigma_{i0} = 2$ for all i. The curve for
NOMBASG2 was indistinguishable from the curve for NOMBASG, indicating that the
typical robustness of Bayesian procedures with respect to assumptions about the
prior holds in this case.

Figure 2 shows the effect of making the random variables $\theta_i$ exponential
with mean 1, while setting $\theta_{i0} = \sigma_{i0} = 1$ in NOMBAS. The five procedures
dominate each other in the same order as in Figure 1, except that STEIN is now
better than NOMBASN. This is evidence that NOMBAS is robust with respect to
the shape as well as the scale of the prior.

Figure 3 shows a comparison of the five procedures in attempting to select
the Poisson distribution with the greatest mean. The means of the 10 Poisson
distributions were taken to be exponential with mean 4, while setting
$\sigma_{i0} = \theta_{i0} = 4$ in NOMBAS. Since the variance of a Poisson random variable is
the same as the mean, whereas NOMBAS assumes the parameter $s_{ij}$ to be given
independently of the mean, there is clearly no logical way to determine $s_{ij}$
in this case. It was decided to set $s_{ij} = 2$ for all i, j, on the grounds
that the means are all "roughly" 4, and $\sqrt{4} = 2$. This thinking is imprecise,
but that is really the point: NOMBAS appears to be robust with respect to
problems of this type. Figure 3 shows that the order of dominance is as in
Figure 2.

One might at this point entertain the hypothesis that NOMBASN and NOMBASG 
are actually optimal: NOMBASN minimizing average loss within the class of
procedures where the number of tests is fixed, and NOMBASG minimizing average
loss within the class where the number of tests is fixed on the average. These
hypotheses are false. The next section documents a counterexample; it can 
be skipped without loss of continuity if the reader desires. 

NOMBAS is not optimal 

We first give an example showing that NOMBASN is not optimal when n = 2.
Suppose k = 3, $\sigma_{i0} = (\sqrt{2}, 1/\sqrt{2}, 0)$, $\theta_{i0} = (0, 1, 1)$, and $s_{ij} = 1$ for all i, j.
The first category has a small mean and a large variance, the second has a
large mean and a small variance, and the third should never be tested because
$\sigma_{30} = 0$. Using (4) with $\delta_{10} = 1$, $\delta_{20} = 0$, and, from (5), $\tilde{\sigma}_{10} = 2/\sqrt{3}$ and
$\tilde{\sigma}_{20} = 1/\sqrt{6}$, we find that $g_{10} = .123$ and $g_{20} = .162$, so category 2 should
be tested if n = 1, and would be the first category tested by NOMBASN in any
case. Let $\theta_{21}$ be the mean of $\theta_2$ given the results of this test, and let
$g(\theta_{21})$ be the difference (average gain from making the second test on
category 1) - (average gain from making the second test on category 2). Then,
since $\tilde{\sigma}_{21} = 1/\sqrt{12}$,

(10)   $g(\theta_{21}) = \begin{cases} g_{10} - (1/\sqrt{12}) F((1 - \theta_{21})\sqrt{12}) & \text{if } \theta_{21} \le 1 \\ (2/\sqrt{3}) F(\theta_{21}\sqrt{3}/2) - (1/\sqrt{12}) F((\theta_{21} - 1)\sqrt{12}) & \text{if } \theta_{21} > 1. \end{cases}$

Since F(·) is decreasing, the minimum of $g(\theta_{21})$ when $\theta_{21} \le 1$ is g(1),
which is positive. $g(\theta_{21})$ is also positive for $\theta_{21} > 1$ since it is
asymptotically 0 and has a unique critical point (a maximum) at $\theta_{21} = 4/3$.
So NOMBASN will make the second test on category 1 regardless of the outcome
of the first test on category 2.
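The numbers in this example can be checked with a short Python sketch. It evaluates F(y) through the closed form given as equation (11) and scans equation (10) over a grid of values of the revised mean of category 2; the grid limits and variable names are arbitrary choices of the sketch.

```python
import math

def F(y):
    phi = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return phi - y * 0.5 * math.erfc(y / math.sqrt(2.0))

R3, R6, R12 = math.sqrt(3.0), math.sqrt(6.0), math.sqrt(12.0)

g10 = (2.0 / R3) * F(R3 / 2.0)   # expected gain of testing category 1 first
g20 = (1.0 / R6) * F(0.0)        # expected gain of testing category 2 first

def g(theta21):
    """Equation (10): gain of testing 1 minus gain of testing 2 on the second test."""
    if theta21 <= 1.0:
        return g10 - (1.0 / R12) * F((1.0 - theta21) * R12)
    return (2.0 / R3) * F(theta21 * R3 / 2.0) - (1.0 / R12) * F((theta21 - 1.0) * R12)

grid = [-3.0 + 0.01 * i for i in range(801)]   # revised mean from -3 to 5
smallest = min(g(t) for t in grid)
print(round(g10, 3), round(g20, 3), smallest > 0.0)
```

A grid scan is of course not a proof; the monotonicity argument in the text is what establishes positivity everywhere.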

The procedure (call it P) that tests the categories in the order 1, 2 is
equivalent to NOMBASN, since the two procedures do the same tests. Now consider
the procedure P' that first tests 1 and then tests the category with the
largest gain. Since, from (5), $\tilde{\sigma}_{11} > \tilde{\sigma}_{20}$, P' will test 1 again if $\theta_{11} = 1$, and
will therefore test 1 again with positive probability. So P' is strictly
better than NOMBASN. This establishes that NOMBASN is not optimal in general.
Essentially the same example can be used to show the non-optimality of NOMBASG,
since NOMBASG can be forced to make exactly two tests by selecting a gain
cutoff g that is so small that at least two tests will be made, while
simultaneously assuming that $s_{ij}$ is so large for j > 2 that at most two tests will
be made. The possibility remains that NOMBASG might be optimal for the case
where $s_{ij}$ does not depend on j, but NOMBASG is not optimal in general.

Practical Considerations 

The fact that NOMBASG dominates all other procedures in the sense we have 
described is not necessarily conclusive, even for problems that closely resemble 
the example we have used. NOMBASG is Bayesian and sequential, so the usual 
arguments about Bayesian vs. traditional and sequential vs. non-sequential 
decision procedures apply. It is not our intention to resurrect those arguments 
here. However, NOMBAS has some unique difficulties that should be appreciated 
by anyone tempted to use it. 

NOMBAS makes tests one at a time. This is the source of its power, but 
it is also potentially a source of difficulty. Making tests in batches may 
have advantages in terms of speed, cost, or constancy of experimental conditions. 
Any of these factors could be decisive in a given application. However, we 
suggest that one class of applications where these factors are typically absent 
is in selection of the best of several large Monte-Carlo computer simulations; 
in fact, it was just such an application that suggested the NOMBAS procedure 
in the first place. In that application, ten different Monte Carlo simulations
(actually one computer program with ten different sets of gun parameters) were
available, each of a defensive gun being attacked by a large number of attackers.
The intention was to select the gun that destroyed, on the average, the greatest
number of attackers before being overwhelmed. The process of writing and
debugging the program provided the initial estimates required.



A critical problem in the use of NOMBASG is the selection of the parameter
g. It might be reasonable to ask the experimenter to estimate the amount of
gain g' in the selected mean that would be just marginally worth the cost of
a single test; i.e., the absolute slope of the E(L) vs E(N) curve at the desired
E(N). Unfortunately, there is usually a great difference between g' and g.
To obtain the point where E(N) = 30 in Figure 1, for example, it is necessary
to take $g = 1.3 \times 10^{-8}$. The absolute slope of the NOMBASG graph of E(L) vs
E(N) at that point is $g' = 5.2 \times 10^{-4}$. The great disparity between these two
numbers is connected with the fact that the sequence $\max_i g_{ij}$ is typically
not monotonically decreasing in j; i.e., the fact that a large gain is not
likely on the current trial does not rule out the possibility in the future.
Unfortunately, this "explanation" provides no rule of thumb by which g might
be obtained from g'. Only a qualitative statement can be made: NOMBASG is
remarkably reluctant to make tests, and therefore most experiments should be
made with a remarkably small number g. The only redeeming feature is that
NOMBASG is not very sensitive to g anyway; Figure 1 shows that changes of
several orders of magnitude in g are required to increase E(N) from 30 to 40
or decrease E(N) from 30 to 20.

In many cases, the experimenter may have a rough idea of how many tests
should be performed, as well as some possibly conflicting feelings about
acceptable terminal states. For such an experimenter we suggest the following
NOMBAS procedure, which capitalizes on the fact that NOMBASN and NOMBASG make
tests in the same order, and that the Bayes' calculations (1) - (3) are valid
even if the tests are not performed in NOMBAS order.

1. Make the required estimates of $\theta_{i0}$, $\sigma_{i0}$, and $s_{ij}$; i = 1,...,k,
$j \ge 1$. Typically, $s_{ij}$ will not depend on j.

2. Perform a small number j of tests. These tests could be made in
NOMBAS order, or, in case the idea of being "fair" to all categories
is important, they could be spread evenly over the categories. Use
equations (1) - (3) for each test and also (4) - (7) if NOMBAS order
is used. Calculate $\theta_{ij}$, $\sigma_{ij}$, and $g_{ij}$; i = 1,...,k.

3. Examine the calculations to determine whether testing should be
continued. The runners-up to the largest of the $g_{ij}$ should not be ignored
(as NOMBAS does); the presence of close runners-up is a motive for
continuation. The fact that $\theta_{ij}$ and $\sigma_{ij}$ have well defined meanings
should be an aid in making the decision. If no further testing is
appropriate, select the largest of the $\theta_{ij}$. Otherwise, return to
step 2.

The above procedure is intended to be a compromise between NOMBASN and NOMBASG, 
and is probably somewhere between them in effectiveness. 

The fact that NOMBAS is a Bayesian procedure has some practical advantages. 
Suppose that category * were revealed to be best after a limited amount of 
testing. This might cause a closer examination of category * , and it might turn 
out that category * was tested incorrectly; an error in coding might be
the reason if * were a computer simulation. If the other categories were
not in error, then the experiment could be continued by correcting the error
in *, resetting $\theta_{*j}$ and $\sigma_{*j}$ to $\theta_{*0}$ and $\sigma_{*0}$, and then continuing to make
tests in NOMBAS order. The testing already done on non-* categories would
not have to be wasted by starting the whole experiment over, and the experiment 
could be continued using the originally intended logic. 

Finally, and to the extent that general conclusions are justified by 
experiments such as those we have described: 

1. If the number of tests must be fixed, then NOMBASN is substantially 
better than FIXED.

2. If a sequential experiment is acceptable, and if NOMBAS is rejected
on account of its Bayesian origins, then PAULSON is better than
STEIN.

The function F(y)

It is not difficult to show that the function F(y) defined in (7) can be
expressed as

(11)   $F(y) = \int_y^\infty (x - y) \, d\Phi(x) = \phi(y) - y(1 - \Phi(y))$,

since the right and left-hand sides are both asymptotically 0 and have the
same derivative with respect to y. Here $\phi$ is the standard normal density.
Since the cumulative normal function $\Phi(y)$ is widely tabulated, this provides
a ready means of evaluation. However, for large y the right-hand side of (11)
is the difference of two small and very nearly equal quantities, which is
numerically unfortunate. To get around this difficulty, write (11) as

(12)   $F(y) = \phi(y)(1 - yR(y))$,

where $R(y) = (1 - \Phi(y))/\phi(y)$ is Mills' ratio. Mills' ratio satisfies the
following inequality [12]:

(13)   $2/(y + \sqrt{y^2 + 2b_\infty}) \le R(y) \le 2/(y + \sqrt{y^2 + 2b_0})$,

where $b_0 = 4/\pi$ and $b_\infty = 2$. Let

(14)   $b(y) = (8/\pi + 2.36y + y^2)/(2 + .5(2.36y + y^2))$.

Then $b(0) = b_0$ and $b(\infty) = b_\infty$ regardless of the parameter that is 2.36
in (14), which means that the function

(15)   $\hat{R}(y) = 2/(y + \sqrt{y^2 + 2b(y)})$

is a good approximation to R(y) for large and small y. The parameter that
is 2.36 was selected to give a good fit over the midrange, and the function

(16)   $\hat{F}(y) \equiv \phi(y)(1 - y\hat{R}(y))$

was used as an approximation to F(y) in all computations reported here.

Some algebra shows that

(17)   $\hat{F}(y) = 2\phi(y) b(y)/(y + \sqrt{y^2 + 2b(y)})^2$,

which eliminates the need to take the difference of two small and nearly equal
quantities. The relative difference $|\hat{F}(y) - F(y)|/F(y)$ never exceeds .003. Given
the apparent robustness of NOMBAS, it is likely that simpler approximations
to F(y) than (17) would be adequate.
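The approximation (17) can be compared with a direct evaluation of (11) in a few lines of Python. This is a sketch only: the function names are hypothetical, the direct evaluation relies on the standard library's complementary error function rather than on tables, and the comparison grid from 0 to 4 is an arbitrary choice.

```python
import math

def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def b(y):
    """Interpolating function of equation (14)."""
    q = 2.36 * y + y * y
    return (8.0 / math.pi + q) / (2.0 + 0.5 * q)

def F_approx(y):
    """Equation (17): no subtraction of nearly equal quantities."""
    by = b(y)
    return 2.0 * phi(y) * by / (y + math.sqrt(y * y + 2.0 * by)) ** 2

def F_direct(y):
    """Equation (11), using erfc for the upper tail 1 - Phi(y)."""
    return phi(y) - y * 0.5 * math.erfc(y / math.sqrt(2.0))

ys = [0.1 * i for i in range(41)]   # y from 0 to 4
worst = max(abs(F_approx(y) - F_direct(y)) / F_direct(y) for y in ys)
print(worst)
```

The printed value is the largest relative difference found on the grid, which can be compared with the bound of .003 quoted above.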
Figure 1: Selecting the largest of ten normally distributed means of
normal random variables.

Figure 2: Selecting the largest of ten exponentially distributed means
of normal random variables.

Figure 3: Selecting the largest of ten exponentially distributed means
of Poisson random variables.